cudaPackages.cuda_compat: automatically discover libnvrm* on jetsons #273389

SomeoneSerge · 2023-12-10T17:28:12Z

...in a very cursed way. PoC/not for merging. Related to #267247

This is arguably very much wrong, but still up for discussion.
CC @NixOS/cuda-maintainers @yannham @Kiskae

Description of changes

The approach is to add the impure location of libnvrm* to the RUNPATH of the compat libcuda. That doesn't work out of the box, however: libnvrm* have many dependencies, including libstdc++, and unless we preload them they are looked up only in the FHS locations, where the lookup fails (that's the short version).
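As a rough illustration of the idea (not the code in this PR), the sketch below builds a `patchelf` command line that appends the impure directory to the RUNPATH and adds the extra libraries as DT_NEEDED entries, so that they are resolved through libcuda's own RUNPATH before libnvrm* ask for them. The function name `patchelf_args` and the default directory are assumptions made for the example; the directory is the one that shows up in the `ldd` output on stock Ubuntu. Only the `--add-rpath` and `--add-needed` flags of `patchelf` are used.

```python
import shlex
from typing import Iterable, List

# Assumption: the Tegra driver directory on a stock Ubuntu/L4T install.
TEGRA_LIB_DIR = "/usr/lib/aarch64-linux-gnu/tegra"


def patchelf_args(libcuda: str,
                  extra_needed: Iterable[str],
                  impure_dir: str = TEGRA_LIB_DIR) -> List[str]:
    """Build a patchelf argv that appends impure_dir to the RUNPATH of
    libcuda and adds each extra library as a DT_NEEDED entry."""
    args = ["patchelf", "--add-rpath", impure_dir]
    seen = set()
    for lib in extra_needed:
        if lib in seen:  # a name listed twice only needs one entry
            continue
        seen.add(lib)
        args += ["--add-needed", lib]
    args.append(libcuda)
    return args


if __name__ == "__main__":
    # Print the command instead of running it; this is a dry run.
    print(shlex.join(patchelf_args(
        "compat/libcuda.so",
        ["libnvos.so", "libstdc++.so", "libnvos.so"],
    )))
```

Printing the command rather than executing it keeps the sketch harmless on machines without `patchelf` or without the Tegra libraries.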

Note that none of this is required with jetpack-nixos, because they control their libnvrm* and thus they can patchelf them

This seems to work sometimes:

$ uname -a
Linux ubuntu 5.10.104-tegra #1 SMP PREEMPT Sun Mar 19 07:55:28 PDT 2023 aarch64 aarch64 aarch64 GNU/Linux
$ nix registry pin nixpkgs github:NixOS/nixpkgs/66bd9f07d7fec5327721a3d8a315ef21ca7536a7
$ nix build -f "<nixpkgs>" --arg config '{ allowUnfree = true; cudaSupport = true; cudaCapabilities = [ "7.2" ]; cudaEnableForwardCompat = false; }' cudaPackages.cuda_compat -o cuda_compat
$ LD_LIBRARY_PATH=$PWD/cuda_compat/compat nix run -f "<nixpkgs>" --arg config '{ allowUnfree = true; cudaSupport = true; cudaCapabilities = [ "7.2" ]; cudaEnableForwardCompat = true; }' cudaPackages.saxpy
...
Runtime version: 11080
Driver version: 11080
...
$ LD_LIBRARY_PATH=$PWD/cuda_compat/compat nix-shell --arg config '{ allowUnfree = true; cudaSupport = true; cudaCapabilities = [ "7.2" ]; cudaEnableForwardCompat = true; }' -p '(python3.withPackages (ps: with ps; [ torch ]))' --run python
...
copying path '/nix/store/vywlw4kkrk52njgrd689wnw6fzwcvaws-python3.11-torch-2.1.1' from 'https://cuda-maintainers.cachix.org'...
copying path '/nix/store/3xk5x49b4ydl6k5xy9c6k4hdkg0q8h6w-python3.11-triton-2.0.0' from 'https://cuda-maintainers.cachix.org'...
...
building '/nix/store/sf7jxbqb4jss01s9g0hgzzfppkz8fq90-python3-3.11.6-env.drv'...
created 541 symlinks in user environment
Python 3.11.6 (main, Oct  2 2023, 13:45:54) [GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
True
>>> torch.version.cuda
'11.8'
$ nix shell nixpkgs#glibc.bin --command ldd ./cuda_compat/compat/libcuda.so
        linux-vdso.so.1 (0x0000ffff9b198000)
        libstdc++.so => /nix/store/vs8vyaymrvskn5qvbr9vsdx8267n5gjq-gcc-12.3.0-lib/lib/libstdc++.so (0x0000ffff993c0000)
        libnvrm_host1x.so => /usr/lib/aarch64-linux-gnu/tegra/libnvrm_host1x.so (0x0000ffff99390000)
        libnvrm_chip.so => /usr/lib/aarch64-linux-gnu/tegra/libnvrm_chip.so (0x0000ffff99370000)
        libnvsocsys.so => /usr/lib/aarch64-linux-gnu/tegra/libnvsocsys.so (0x0000ffff99350000)
        libnvsciipc.so => /usr/lib/aarch64-linux-gnu/tegra/libnvsciipc.so (0x0000ffff99320000)
        libnvos.so => /usr/lib/aarch64-linux-gnu/tegra/libnvos.so (0x0000ffff992f0000)
        libnvrm_sync.so => /usr/lib/aarch64-linux-gnu/tegra/libnvrm_sync.so (0x0000ffff992d0000)
        libc.so.6 => /nix/store/cv8mfy5wdfwfw4iwhdlkl4ddy8apl667-glibc-2.38-27/lib/libc.so.6 (0x0000ffff99120000)
        libnvrm_gpu.so => /usr/lib/aarch64-linux-gnu/tegra/libnvrm_gpu.so (0x0000ffff990b0000)
        libnvrm_mem.so => /usr/lib/aarch64-linux-gnu/tegra/libnvrm_mem.so (0x0000ffff99090000)
        libm.so.6 => /nix/store/cv8mfy5wdfwfw4iwhdlkl4ddy8apl667-glibc-2.38-27/lib/libm.so.6 (0x0000ffff98fe0000)
        libdl.so.2 => /nix/store/cv8mfy5wdfwfw4iwhdlkl4ddy8apl667-glibc-2.38-27/lib/libdl.so.2 (0x0000ffff98fb0000)
        librt.so.1 => /nix/store/cv8mfy5wdfwfw4iwhdlkl4ddy8apl667-glibc-2.38-27/lib/librt.so.1 (0x0000ffff98f80000)
        libpthread.so.0 => /nix/store/cv8mfy5wdfwfw4iwhdlkl4ddy8apl667-glibc-2.38-27/lib/libpthread.so.0 (0x0000ffff98f50000)
        libgcc_s.so.1 => /nix/store/vs8vyaymrvskn5qvbr9vsdx8267n5gjq-gcc-12.3.0-lib/lib/libgcc_s.so.1 (0x0000ffff98f10000)
        /nix/store/cv8mfy5wdfwfw4iwhdlkl4ddy8apl667-glibc-2.38-27/lib/ld-linux-aarch64.so.1 (0x0000ffff9b15b000)

SomeoneSerge · 2023-12-10T18:02:55Z

pkgs/development/cuda-modules/cuda/overrides.nix

+      libcudaExtraNeeded = [
+        "libnvos.so"
+        "libnvsocsys.so"
+        "libnvrm_sync.so"
+        "libnvos.so"
+        "libnvsciipc.so"
+        "libnvsocsys.so"
+        "libnvrm_chip.so"
+        "libnvrm_host1x.so"
+        "libstdc++.so"
+      ];


Quoting the message from matrix:

I'm wondering if we could somehow move this to nixglhost (e.g. as LD_PRELOAD) or maybe we could write and link a library that would scan /usr/lib/aarch-blahblaah/tegra and dlopen stuff in the correct order.

...presumably, this list might change at any time, i.e. our current nixpkgs revision might not be compatible with future JetPack releases.
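The "dlopen stuff in the correct order" idea could look roughly like the sketch below. It is not nix-gl-host code: the function names `load_order` and `preload` are made up for the example, and the dependency map is supplied by the caller (in practice it would have to be read from each library's DT_NEEDED entries, which this sketch does not do). It orders the libraries so that dependencies come first and then opens those found in the given directory with `RTLD_GLOBAL`.

```python
import ctypes
import os
from graphlib import TopologicalSorter
from typing import Dict, Iterable, List


def load_order(needed: Dict[str, Iterable[str]]) -> List[str]:
    """Return library names so that each one follows its dependencies.

    `needed` maps a library name to the names it depends on.
    """
    return list(TopologicalSorter(needed).static_order())


def preload(lib_dir: str, needed: Dict[str, Iterable[str]]) -> List[ctypes.CDLL]:
    """dlopen every library from `needed` that exists in lib_dir,
    dependencies first. Names not found there are skipped and left
    to the regular dynamic loader."""
    handles = []
    for name in load_order(needed):
        path = os.path.join(lib_dir, name)
        if os.path.exists(path):
            handles.append(ctypes.CDLL(path, mode=ctypes.RTLD_GLOBAL))
    return handles


if __name__ == "__main__":
    # Hypothetical dependency map, for illustration only.
    example = {
        "libnvrm_gpu.so": ["libnvrm_mem.so", "libnvos.so"],
        "libnvrm_mem.so": ["libnvos.so"],
        "libnvos.so": [],
    }
    print(load_order(example))
```

Keeping the ordering separate from the loading means the order can be inspected or tested on a machine that has no Tegra libraries at all.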

hacker1024 · 2023-12-11T03:20:13Z

Note that none of this is required with jetpack-nixos, because they control their libnvrm* and thus they can patchelf them

My approach so far has been to add the CUDA drivers from jetpack-nixos to the LD_LIBRARY_PATH with a launcher application, NixGL style. This allows CUDA to be used on Ubuntu without loading any system libraries at all, which avoids any problems due to missing dependencies or incompatible glibc versions.

As jetpack-nixos is not part of Nixpkgs, it may be worth developing this approach into an external tool like NixGL. It's also useful for OpenGL and Vulkan, for that matter.

I'm not too sure that system paths like this belong in Nixpkgs. We don't seem to make any accommodations for non-NixOS x86_64 distributions, after all. Using a launcher in place of /run/opengl-driver seems to work pretty well for this.
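A launcher of the kind described in this comment can be sketched in a few lines. This is an illustration only, not the actual tool: the script name, the function `launcher_env`, and the convention of passing the driver directory as the first argument are assumptions. It prepends the driver directory to `LD_LIBRARY_PATH` and then replaces itself with the target program.

```python
import os
import sys
from typing import Dict, List, Mapping


def launcher_env(driver_lib_dir: str, env: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of env with driver_lib_dir prepended to LD_LIBRARY_PATH."""
    new_env = dict(env)
    old = env.get("LD_LIBRARY_PATH", "")
    new_env["LD_LIBRARY_PATH"] = driver_lib_dir + (os.pathsep + old if old else "")
    return new_env


def main(argv: List[str]) -> None:
    if len(argv) < 3:
        sys.exit("usage: launcher.py DRIVER_LIB_DIR PROGRAM [ARGS...]")
    driver_lib_dir, program = argv[1], argv[2]
    # Replace the current process with PROGRAM, using the extended environment.
    os.execvpe(program, argv[2:], launcher_env(driver_lib_dir, os.environ))


if __name__ == "__main__":
    main(sys.argv)
```

Because the driver directory is chosen at launch time, swapping driver versions means changing one argument rather than rebuilding any package.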

SomeoneSerge · 2023-12-11T14:31:25Z

My approach so far has been to add the CUDA drivers from jetpack-nixos to the LD_LIBRARY_PATH with a launcher application, NixGL style. This allows CUDA to be used on Ubuntu without loading any system libraries at all,

We're yet to see how reliable that is: if your Ubuntu came with a different l4t-core release than the jetpack-nixos you're taking the package from, your libnvrm* may not necessarily be compatible with the kernel. Honestly, I don't even know exactly what they are. If we were to look into reusing jetpack-nixos's l4t-core and linking these libraries directly, we'd have to just test for compatibility ourselves, over a matrix of JetPack versions for the kernel and for the libraries. Also note that the legal status of the debs is kind of unclear.

EDIT: to reiterate, that's not an issue when using jetpack-nixos instead of Ubuntu, because then we just know which kernel we're using.

I'm not too sure that system paths like this belong in Nixpkgs. We don't seem to make any accommodations for non-NixOS x86_64 distributions, after all

This even accommodates a specific device, and that kind of makes sense because the package (cuda_compat) is also specific to these devices.

We don't seem to make any accommodations for non-NixOS x86_64 distributions, after all

In a way we do: we allow LD_LIBRARY_PATH. In the future we might look into even more tailored mechanisms (libc patches) to ease the use of Nixpkgs on FHS distributions.

All of that said, the present PR is definitely not the way to go, because this approach is unmaintainable. I just wanted to show that this particular hack does work (at the moment).

hacker1024 · 2023-12-11T22:51:58Z

If we were to look into reusing jetpack-nixos's l4t-core and linking these libraries directly, we'd have to just test for compatibility ourselves, over a matrix of jetpack versions for the kernel and for the libraries.

jetpack-nixos is tied to specific JetPack versions. If we were to make a NixGL-like launcher, we could instruct users to make sure that they use the appropriate revision of jetpack-nixos for their host JetPack version. Mismatched configurations would be untested and not explicitly supported.

your libnvrm* may not necessarily be compatible with the kernel.

The recently released JetPack 6 Developer Preview has explicit support for upstream kernels. I don't think kernel versions will be of too much concern due to this, as NVIDIA would presumably need to keep ABIs between their kernel and userspace drivers fairly stable so that custom kernels don't constantly break.

Also note that the legal status of the debs is kind of unclear

That's a good point, but if Anduril are happy with it I'm not terribly concerned. This would be a separate tool, so no issues with Nixpkgs.

SomeoneSerge · 2023-12-11T23:32:15Z

I don't think kernel versions will be of too much concern due to this

I meant the kernel modules that libcuda and libnvrm* may be interacting with

samuela · 2023-12-14T00:57:44Z

Marking this PR as draft since it sounds like it is intended to be a WIP for the time being. But feel free to adjust as appropriate

SomeoneSerge · 2023-12-19T18:43:20Z

The wrapper approach (a la numtide/nix-gl-host#10) is preferable because if/when NVIDIA changes libnvrm*'s dependencies we can just update the wrapper, without rebuilding anything in Nixpkgs.

SomeoneSerge requested a review from ConnorBaker December 10, 2023 17:28

SomeoneSerge force-pushed the feat/cuda-compat-use-fhs-libnvrm branch from b5831fd to 66bd9f0 Compare December 10, 2023 17:33

SomeoneSerge added the 6.topic: cuda Parallel computing platform and API label Dec 10, 2023

ofborg bot added 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 0 This PR does not cause any packages to rebuild on Linux labels Dec 10, 2023

samuela marked this pull request as draft December 14, 2023 00:56

cudaPackages.cuda_compat: automatically discover libnvrm* on nvidia jetsons

8873577

SomeoneSerge force-pushed the feat/cuda-compat-use-fhs-libnvrm branch from 66bd9f0 to 8873577 Compare December 19, 2023 15:27

SomeoneSerge added a commit to SomeoneSerge/pkgs that referenced this pull request Dec 19, 2023

pkgsXavier: reuse the impure cuda_compat deps hack

cf92f42

...from NixOS/nixpkgs#273389

SomeoneSerge mentioned this pull request Dec 19, 2023

cudaPackages: improve the handling of cuda_compat #273797

Open

wegank added the 2.status: merge conflict This PR has merge conflicts with the target branch label Mar 20, 2024

SomeoneSerge mentioned this pull request Jul 4, 2024

[Tracking] Agree on a name for hardware.graphics #323675

Open

wegank added the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Jul 4, 2024

SomeoneSerge closed this Oct 22, 2024
